22 research outputs found

    Robot-aided cloth classification using depth information and CNNs

    We present a system to deal with the problem of classifying garments from a pile of clothes. The system uses a robot arm to extract a garment and show it to a depth camera. Using only depth images of a partial view of the garment as input, a deep convolutional neural network has been trained to classify different types of garments. The robot can rotate the garment along the vertical axis to provide additional views, increasing the prediction confidence and avoiding confusions. In addition to obtaining very high classification scores, our system provides a fast and occlusion-robust solution to the problem, compared to previous approaches to cloth classification that match the sensed data against a database.
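    The following is a minimal sketch, not the authors' code, of the multi-view idea described above: per-view CNN predictions on depth images are averaged over successive garment rotations until the classifier is confident enough. The network architecture, class list, and the `grab_view` capture callback are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

CLASSES = ["t-shirt", "pants", "towel", "hoodie"]  # hypothetical garment types

class DepthGarmentCNN(nn.Module):
    """Small CNN over single-channel depth images (stand-in for the trained model)."""
    def __init__(self, num_classes=len(CLASSES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, depth):                 # depth: (B, 1, H, W)
        return self.classifier(self.features(depth).flatten(1))

def classify_with_rotations(model, grab_view, max_views=8, conf_thresh=0.9):
    """Average softmax scores over rotated views until confidence is high enough."""
    model.eval()
    probs_sum = torch.zeros(len(CLASSES))
    with torch.no_grad():
        for i in range(max_views):
            depth = grab_view(i)              # robot rotates garment, returns (1, 1, H, W)
            probs_sum += F.softmax(model(depth), dim=1).squeeze(0)
            avg = probs_sum / (i + 1)
            conf, label = avg.max(0)
            if conf >= conf_thresh:           # confident enough: stop rotating
                break
    return CLASSES[label.item()], conf.item()

# usage: label, conf = classify_with_rotations(DepthGarmentCNN(), my_depth_capture_fn)
```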

    PI-Net: Pose Interacting Network for Multi-Person Monocular 3D Pose Estimation

    Recent literature has addressed the monocular 3D pose estimation task very satisfactorily. In these studies, different persons are usually treated as independent pose instances to estimate. However, in many everyday situations people are interacting, and the pose of an individual depends on the pose of his/her interactees. In this paper, we investigate how to exploit this dependency to enhance current – and possibly future – deep networks for 3D monocular pose estimation. Our pose interacting network, or PI-Net, inputs the initial pose estimates of a variable number of interactees into a recurrent architecture used to refine the pose of the person-of-interest. Evaluating such a method is challenging due to the limited availability of public annotated multi-person 3D human pose datasets. We demonstrate the effectiveness of our method on the MuPoTS dataset, setting the new state of the art on it. Qualitative results on other multi-person datasets (for which 3D pose ground truth is not available) showcase the proposed PI-Net. PI-Net is implemented in PyTorch and the code will be made available upon acceptance of the paper.
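    A minimal PyTorch sketch of the refinement idea described above (not the released PI-Net code): the initial 3D poses of a variable number of interactees feed a recurrent encoder whose summary conditions a residual correction of the person-of-interest pose. The joint count and hidden sizes are assumptions.

```python
import torch
import torch.nn as nn

class PoseInteractingRefiner(nn.Module):
    def __init__(self, num_joints=17, hidden=256):
        super().__init__()
        pose_dim = num_joints * 3
        self.interactee_rnn = nn.GRU(pose_dim, hidden, batch_first=True)
        self.refiner = nn.Sequential(
            nn.Linear(pose_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, pose_dim),
        )

    def forward(self, poi_pose, interactee_poses):
        """
        poi_pose:          (B, J, 3) initial estimate of the person-of-interest
        interactee_poses:  (B, N, J, 3) initial estimates of N interactees
        returns a refined (B, J, 3) pose, predicted as a residual correction
        """
        B, N, J, _ = interactee_poses.shape
        _, h = self.interactee_rnn(interactee_poses.reshape(B, N, J * 3))
        context = h.squeeze(0)                              # (B, hidden) interaction summary
        x = torch.cat([poi_pose.reshape(B, J * 3), context], dim=1)
        return poi_pose + self.refiner(x).reshape(B, J, 3)

# usage: refined = PoseInteractingRefiner()(torch.randn(2, 17, 3), torch.randn(2, 4, 17, 3))
```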

    D-NeRF: neural radiance fields for dynamic scenes

    Neural rendering techniques combining machine learning with geometric reasoning have arisen as one of the most promising approaches for synthesizing novel views of a scene from a sparse set of images. Among these, Neural Radiance Fields (NeRF) stands out: it trains a deep network to map 5D input coordinates (representing spatial location and viewing direction) into a volume density and view-dependent emitted radiance. However, despite achieving an unprecedented level of photorealism on the generated images, NeRF is only applicable to static scenes, where the same spatial location can be queried from different images. In this paper we introduce D-NeRF, a method that extends neural radiance fields to the dynamic domain, allowing us to reconstruct and render novel images of objects under rigid and non-rigid motions. For this purpose we consider time as an additional input to the system, and split the learning process into two main stages: one that encodes the scene into a canonical space and another that maps this canonical representation into the deformed scene at a particular time. Both mappings are learned using fully connected networks. Once the networks are trained, D-NeRF can render novel images, controlling both the camera view and the time variable, and thus the object movement. We demonstrate the effectiveness of our approach on scenes with objects under rigid, articulated and non-rigid motions. This work is supported in part by a Google Daydream Research award and by the Spanish government with the project HuMoUR TIN2017-90086-R, the ERA-Net Chistera project IPALM PCI2019-103386 and the María de Maeztu Seal of Excellence MDM-2016-0656. Gerard Pons-Moll is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 409792180 (Emmy Noether Programme, project: Real Virtual Humans).
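    A compact, simplified sketch of the two-stage scheme the abstract describes (an assumption-level illustration: no positional encoding, no volume rendering loop, untrained weights): a deformation MLP maps a point and a time to a displacement into the canonical space, and a canonical MLP maps the displaced point plus viewing direction to density and RGB radiance.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256, depth=4):
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    layers += [nn.Linear(d, out_dim)]
    return nn.Sequential(*layers)

class TinyDNeRF(nn.Module):
    def __init__(self):
        super().__init__()
        self.deform = mlp(3 + 1, 3)      # (x, t) -> displacement into canonical space
        self.canonical = mlp(3 + 3, 4)   # (canonical x, view dir) -> (sigma, rgb)

    def forward(self, x, t, view_dir):
        """x, view_dir: (N, 3); t: (N, 1) in [0, 1]."""
        delta = self.deform(torch.cat([x, t], dim=-1))
        out = self.canonical(torch.cat([x + delta, view_dir], dim=-1))
        sigma = torch.relu(out[:, :1])           # non-negative volume density
        rgb = torch.sigmoid(out[:, 1:])          # colour in [0, 1]
        return sigma, rgb

# usage: sigma, rgb = TinyDNeRF()(torch.rand(1024, 3), torch.rand(1024, 1), torch.rand(1024, 3))
```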

    Enhancing egocentric 3D pose estimation with third person views

    We propose a novel approach to enhance the 3D body pose estimation of a person computed from videos captured from a single wearable camera. The main technical contribution consists of leveraging high-level features linking first- and third-person views in a joint embedding space. To learn such an embedding space we introduce First2Third-Pose, a new paired synchronized dataset of nearly 2000 videos depicting human activities captured from both first- and third-person perspectives. We explicitly consider spatial- and motion-domain features, combined using a semi-Siamese architecture trained in a self-supervised fashion. Experimental results demonstrate that the joint multi-view embedding space learned with our dataset is useful to extract discriminative features from arbitrary single-view egocentric videos, with no need to perform any sort of domain adaptation or knowledge of camera parameters. An extensive evaluation demonstrates that we achieve significant improvement in egocentric 3D body pose estimation performance on two unconstrained datasets, over three supervised state-of-the-art approaches. The collected dataset and pre-trained model are available for research purposes. This work has been partially supported by projects PID2020-120049RB-I00 and PID2019-110977GA-I00 funded by MCIN/AEI/10.13039/501100011033 and by the “European Union NextGenerationEU/PRTR”, as well as by grant RYC-2017-22563 funded by MCIN/AEI/10.13039/501100011033 and by “ESF Investing in your future”, and network RED2018-102511-T funded by MCIN/AEI.
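    An illustrative sketch only (network shapes and the loss are assumptions, not the paper's exact training code): two partially shared, "semi-Siamese" branches map synchronized first- and third-person clip features into a joint embedding space, trained self-supervised so that matching pairs land close together.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemiSiameseEmbedder(nn.Module):
    def __init__(self, feat_dim=512, embed_dim=128):
        super().__init__()
        # view-specific heads (the "semi" part), then a shared projection
        self.ego_head = nn.Linear(feat_dim, 256)
        self.exo_head = nn.Linear(feat_dim, 256)
        self.shared = nn.Sequential(nn.ReLU(), nn.Linear(256, embed_dim))

    def forward(self, ego_feat, exo_feat):
        z_ego = F.normalize(self.shared(self.ego_head(ego_feat)), dim=1)
        z_exo = F.normalize(self.shared(self.exo_head(exo_feat)), dim=1)
        return z_ego, z_exo

def contrastive_alignment_loss(z_ego, z_exo, temperature=0.07):
    """InfoNCE-style loss: the i-th ego clip should match the i-th exo clip."""
    logits = z_ego @ z_exo.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(z_ego.size(0), device=z_ego.device)
    return F.cross_entropy(logits, targets)

# usage with pre-extracted spatial/motion clip features (placeholders):
# z1, z3 = SemiSiameseEmbedder()(torch.randn(8, 512), torch.randn(8, 512))
# loss = contrastive_alignment_loss(z1, z3)
```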

    SMPLicit: Topology-aware generative model for clothed people

    In this paper we introduce SMPLicit, a novel generative model to jointly represent body pose, shape and clothing geometry. In contrast to existing learning-based approaches that require training specific models for each type of garment, SMPLicit can represent different garment topologies in a unified manner (e.g., from sleeveless tops to hoodies and open jackets), while controlling other properties like garment size or tightness/looseness. We show our model to be applicable to a large variety of garments including T-shirts, hoodies, jackets, shorts, pants, skirts, shoes and even hair. The representation flexibility of SMPLicit builds upon an implicit model conditioned with the SMPL human body parameters and a learnable latent space which is semantically interpretable and aligned with the clothing attributes. The proposed model is fully differentiable, allowing its use within larger end-to-end trainable systems. In the experimental section, we demonstrate that SMPLicit can be readily used for fitting 3D scans and for 3D reconstruction in images of dressed people. In both cases we are able to go beyond the state of the art by retrieving complex garment geometries, handling situations with multiple clothing layers and providing a tool for easy outfit editing. To stimulate further research in this direction, we will make our code and model publicly available at http://www.iri.upc.edu/people/ecorona/smplicit/. This work is supported in part by an Amazon Research Award and by the Spanish government with the projects HuMoUR TIN2017-90086-R and María de Maeztu Seal of Excellence MDM-2016-0656. Gerard Pons-Moll is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 409792180 (Emmy Noether Programme, project: Real Virtual Humans).
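    A minimal sketch of the kind of conditioned implicit decoder the abstract describes (layer sizes and the conditioning layout are assumptions): a 3D query point, the SMPL body parameters, and a garment latent code are mapped to a distance-to-cloth value, so the garment surface is a level set of the network output.

```python
import torch
import torch.nn as nn

class ClothedImplicitDecoder(nn.Module):
    def __init__(self, smpl_dim=82, latent_dim=18, hidden=256):
        # smpl_dim ~ pose (72) + shape (10); latent_dim: garment style/cut code (assumed sizes)
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + smpl_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, points, smpl_params, latent):
        """
        points:      (B, P, 3) query locations around the body
        smpl_params: (B, smpl_dim) body pose/shape conditioning
        latent:      (B, latent_dim) interpretable garment code
        returns      (B, P) predicted distance-to-cloth values
        """
        B, P, _ = points.shape
        cond = torch.cat([smpl_params, latent], dim=1)
        cond = cond.unsqueeze(1).expand(B, P, cond.shape[-1])
        return self.net(torch.cat([points, cond], dim=-1)).squeeze(-1)

# Because the decoder is differentiable w.r.t. the latent code, it can be fitted
# to scans or image cues by gradient descent, which is what makes the end-to-end
# uses mentioned in the abstract possible.
```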

    Multi-FinGAN: generative coarse-to-fine sampling of multi-finger grasps

    While there exist many methods for manipulating rigid objects with parallel-jaw grippers, grasping with multi-finger robotic hands remains a relatively unexplored research topic. Reasoning and planning collision-free trajectories over the additional degrees of freedom of several fingers represents an important challenge that, so far, involves computationally costly and slow processes. In this work, we present Multi-FinGAN, a fast generative multi-finger grasp sampling method that synthesizes high-quality grasps directly from RGB-D images in about a second. We achieve this by training, in an end-to-end fashion, a coarse-to-fine model composed of a classification network that distinguishes grasp types according to a specific taxonomy and a refinement network that produces refined grasp poses and joint angles. We experimentally validate and benchmark our method against a standard grasp-sampling method on 790 grasps in simulation and 20 grasps on a real Franka Emika Panda. All experimental results using our method show consistent improvements both in terms of grasp quality metrics and grasp success rate. Remarkably, our approach is up to 20-30 times faster than the baseline, a significant improvement that opens the door to feedback-based grasp re-planning and task-informative grasping. Code is available at https://irobotics.aalto.fi/multi-fingan/.
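    A rough PyTorch sketch of the coarse-to-fine split described above (the feature extractor, taxonomy size, and hand joint count are placeholder assumptions, not the Multi-FinGAN implementation): one head classifies the grasp type from image features, and a second head refines a 6-DoF palm pose and finger joint angles conditioned on that type.

```python
import torch
import torch.nn as nn

class CoarseToFineGraspHead(nn.Module):
    def __init__(self, feat_dim=512, num_grasp_types=6, num_joints=16):
        super().__init__()
        self.type_classifier = nn.Linear(feat_dim, num_grasp_types)    # coarse stage
        self.refiner = nn.Sequential(                                  # fine stage
            nn.Linear(feat_dim + num_grasp_types, 256), nn.ReLU(),
            nn.Linear(256, 7 + num_joints),   # 3 translation + 4 quaternion + joint angles
        )

    def forward(self, feats):
        type_logits = self.type_classifier(feats)
        type_probs = torch.softmax(type_logits, dim=1)
        out = self.refiner(torch.cat([feats, type_probs], dim=1))
        trans, quat, joints = out[:, :3], out[:, 3:7], out[:, 7:]
        quat = quat / quat.norm(dim=1, keepdim=True).clamp_min(1e-8)   # unit rotation
        return type_logits, trans, quat, joints

# usage: logits, t, q, theta = CoarseToFineGraspHead()(torch.randn(4, 512))
```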

    Using CNNs to classify and grasp cloth garments

    A degree thesis submitted to the Faculty of Escola Tècnica d’Enginyeria de Telecomunicació de Barcelona, Universitat Politècnica de Catalunya. Identification and manipulation of deformable objects are currently considered among the most challenging tasks in the field of robotics. Their unpredictable shape and pose make it very difficult to identify them and retrieve their most relevant parts. The aim of this project is divided into two tasks. First, to recognize a garment among four previously modeled types. And second, to search for suitable grasping points in order to bring the cloth from its initial random position to a known configuration. Both tasks are solved using Convolutional Neural Networks (CNNs) trained with both real and synthetically generated clothing depth images. We developed a method to detect, after the garment is recognized, two garment-based predefined grasping points. A CNN is used to predict their visibility and position, choosing between rotating or grasping the garment. Once the first point is grasped, the second is predicted similarly with a more specialized CNN.
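    A hedged sketch of the grasp-point prediction step described in the abstract (the architecture is a placeholder, not the thesis network): from a depth image the model outputs, for each of the two predefined grasping points, a visibility probability and a normalized 2D image position; low visibility would trigger a rotation instead of a grasp.

```python
import torch
import torch.nn as nn

class GraspPointCNN(nn.Module):
    def __init__(self, num_points=2):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.visibility = nn.Linear(32, num_points)      # one logit per grasp point
        self.position = nn.Linear(32, num_points * 2)    # normalized (u, v) per point

    def forward(self, depth):                            # depth: (B, 1, H, W)
        f = self.backbone(depth)
        vis = torch.sigmoid(self.visibility(f))          # visibility probabilities
        uv = torch.sigmoid(self.position(f)).view(-1, 2, 2)
        return vis, uv

# usage: vis, uv = GraspPointCNN()(torch.randn(1, 1, 120, 160))
# if vis[0, 0] < 0.5: rotate the garment; else grasp at uv[0, 0] scaled to image size
```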

    Context-aware human modelling

    This document summarizes the research plan to be followed during the doctorate.

    Using CNNs to classify and manipulate cloth garments

    Robots are getting more autonomous every day, but it is still hard for them to work with deformable objects such as cloth garments. Due to the changing nature of these objects, robots have to deal with situations that are unknown to them. In the case of clothing, before robots can perform tasks with garments (dressing a person, folding clothes, ...) they should be capable of identifying the garment and grasping it in a known configuration. This lets them manipulate each garment in the expected way. Identification and manipulation of deformable objects are currently considered among the most challenging tasks in the field of robotics. Their unpredictable shape and pose make it very difficult to identify them and retrieve their most relevant parts. The aim of this project is divided into two tasks. First, to recognize a garment among four previously modeled types. And second, to search for suitable grasping points in order to bring the cloth from its initial random position to a known configuration. Both tasks are solved using Convolutional Neural Networks (CNNs) trained with both real and synthetically generated clothing depth images. We developed a method to detect, after the garment is recognized, two garment-based predefined grasping points. A CNN is used to predict their visibility and position, choosing between rotating or grasping the garment. Once the first point is grasped, the second is predicted similarly with a more specialized CNN.

    3D pose estimation for symmetric and nonsymmetric objects

    Master's thesis for the Master in Intelligent Interactive Systems. Supervisor: Sanja Fidler; Co-supervisor: Coloma Ballester. Autonomous systems have to understand 3D space, being able to detect objects and infer their pose in order to pick them up and reliably perform a certain goal-oriented action. An increasing number of works focus on this topic, motivated by self-driving cars or the Amazon Picking Challenge. In particular, we focus on pose estimation, a well-known problem in computer vision and robotics which is essential for object manipulation. It requires reliable identification of object poses to know how to pick objects up in order to interact with them for a certain goal. Additionally, pose estimation involves several difficulties: objects may have rotational symmetries, or their appearance can vary significantly depending on lighting or occlusions. A common approach to pose estimation is to first estimate a coarse pose to initialize ICP and obtain a fine pose estimate. We follow this idea in this work by comparing an object in an RGB-D setting to a set of views of the same CAD model obtained offline. Using Convolutional Neural Networks, we embed the images into a common space where they can be efficiently compared. Additionally, we propose to consider symmetries directly in the comparison to avoid inconsistencies in the pose estimation. Given the lack of benchmarks with symmetric objects for pose estimation, we obtain 6669 CAD models of very different kinds and generate realistic simulations of tabletop scenarios to train and test our approach. We also leverage an unpublished dataset of real objects with symmetries. Finally, we infer rotational symmetries in new CAD models, obtaining high recall and promising results that suggest further research.
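    A simplified, assumption-level sketch of the coarse pose step described above: embeddings of pre-rendered CAD views are compared against the embedding of the observed object crop with cosine similarity, and views that are equivalent under a rotational symmetry share an identifier so they cannot yield contradictory targets. The retrieved pose would then initialize ICP; all names and shapes here are illustrative.

```python
import torch
import torch.nn.functional as F

def retrieve_coarse_pose(query_embed, view_embeds, view_poses, symmetry_ids):
    """
    query_embed:  (D,)      embedding of the observed RGB-D crop
    view_embeds:  (V, D)    embeddings of V rendered CAD views
    view_poses:   (V, 4, 4) camera-to-object pose of each rendered view
    symmetry_ids: (V,)      same id for views related by an object symmetry
    """
    q = F.normalize(query_embed, dim=0)
    sims = F.normalize(view_embeds, dim=1) @ q        # (V,) cosine similarities
    best = int(torch.argmax(sims))
    return {
        "pose": view_poses[best],                     # coarse pose used to initialize ICP
        "symmetry_class": int(symmetry_ids[best]),    # any pose in this class is equally valid
        "score": float(sims[best]),
    }

# usage with random placeholders:
# out = retrieve_coarse_pose(torch.randn(128), torch.randn(600, 128),
#                            torch.eye(4).repeat(600, 1, 1), torch.zeros(600))
```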